Hierarchical Actor-Critic

Authors

  • Andrew Levy
  • Robert Platt
  • Kate Saenko
Abstract

The ability to learn at different resolutions in time may help overcome one of the main challenges in deep reinforcement learning — sample efficiency. Hierarchical agents that operate at different levels of temporal abstraction can learn tasks more quickly because they can divide the work of learning behaviors among multiple policies and can also explore the environment at a higher level. In this paper, we present a novel approach to hierarchical reinforcement learning called Hierarchical Actor-Critic (HAC) that enables agents to learn to break down problems involving continuous action spaces into simpler subproblems belonging to different time scales. HAC has two key advantages over most existing hierarchical learning methods: (i) the potential for faster learning as agents learn short policies at each level of the hierarchy and (ii) an end-to-end approach. We demonstrate that HAC significantly accelerates learning in a series of tasks that require behavior over a relatively long time horizon and involve sparse rewards.
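The core mechanism, a high-level policy proposing subgoals that a lower-level policy pursues for a bounded number of primitive steps, lends itself to a short sketch. Below is a minimal, hypothetical two-level hierarchy in that spirit; the policies, toy dynamics, and horizon H are illustrative assumptions, and the actual HAC method additionally trains a critic at every level and uses hindsight transitions, which this sketch omits.

    import numpy as np

    H = 10  # assumed cap on low-level steps per subgoal

    def high_level_policy(state, goal):
        # Hypothetical stand-in: propose a subgoal partway toward the goal.
        return state + 0.5 * (goal - state)

    def low_level_policy(state, subgoal):
        # Hypothetical stand-in: a primitive action aimed at the subgoal.
        return np.clip(subgoal - state, -1.0, 1.0)

    def env_step(state, action):
        # Toy point-mass dynamics standing in for the real environment.
        return state + 0.1 * action

    def run_episode(state, goal, tol=0.05, max_subgoals=20):
        for _ in range(max_subgoals):
            subgoal = high_level_policy(state, goal)
            for _ in range(H):  # the low level acts at a finer time scale
                state = env_step(state, low_level_policy(state, subgoal))
                if np.linalg.norm(state - subgoal) < tol:
                    break  # subgoal reached; control returns to the high level
            if np.linalg.norm(state - goal) < tol:
                return state, True
        return state, False

    print(run_episode(np.zeros(2), np.ones(2)))

The nested loops are the point: the high level acts once per subgoal, so its effective horizon is roughly max_subgoals rather than max_subgoals * H, which is what makes each level's learning problem short.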


Similar papers

Actor-critic algorithms for hierarchical Markov decision processes

We consider the problem of control of hierarchical Markov decision processes and develop a simulation based two-timescale actor-critic algorithm in a general framework. We also develop certain approximation algorithms that require less computation and satisfy a performance bound. One of the approximation algorithms is a three-timescale actor-critic algorithm while the other is a two-timescale a...
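For intuition about the two-timescale structure, here is a minimal sketch on an assumed two-armed, average-reward toy problem: the critic's step size decays more slowly than the actor's, so the value estimate tracks the current policy between the actor's slower updates. Everything here is an illustrative assumption, not the paper's construction.

    import numpy as np

    rng = np.random.default_rng(0)
    theta = 0.0  # actor parameter: preference for arm 1 over arm 0
    v = 0.0      # critic: estimated average reward of the current policy

    for t in range(1, 5001):
        alpha_critic = 1.0 / t ** 0.6  # faster timescale (decays more slowly)
        alpha_actor = 1.0 / t          # slower timescale
        p1 = 1.0 / (1.0 + np.exp(-theta))         # softmax over two arms
        a = rng.random() < p1
        r = rng.normal(1.0 if a else 0.3)         # arm 1 is better on average
        delta = r - v                             # error vs. average reward
        v += alpha_critic * delta                 # critic update (fast)
        grad_logp = (1 - p1) if a else -p1        # d/dtheta of log pi(a)
        theta += alpha_actor * delta * grad_logp  # actor update (slow)

    print(f"P(arm 1) ~= {1.0 / (1.0 + np.exp(-theta)):.2f}")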


An Actor/Critic Algorithm that is Equivalent to Q-Learning

We prove the convergence of an actor/critic algorithm that is equivalent to Q-learning by construction. Its equivalence is achieved by encoding Q-values within the policy and value function of the actor and critic. The resultant actor/critic algorithm is novel in two ways: it updates the critic only when the most probable action is executed from any given state, and it rewards the actor using c...
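As a generic illustration of the encoding idea (not necessarily the construction the paper proves equivalent to Q-learning), the sketch below splits Q(s, a) into a state value held by a critic and action preferences held by an actor, from which the original Q-values are exactly recoverable.

    import numpy as np

    n_states, n_actions = 3, 2
    rng = np.random.default_rng(0)
    Q = rng.normal(size=(n_states, n_actions))

    V = Q.max(axis=1)   # critic-like component: value of the greedy action
    A = Q - V[:, None]  # actor-like component: preferences, zero when greedy
    assert np.allclose(V[:, None] + A, Q)  # Q is exactly recoverable

    greedy = A.argmax(axis=1)  # the most probable action under a greedy actor
    print(greedy, np.round(V, 2))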


Guide Actor-Critic for Continuous Control

Actor-critic methods solve reinforcement learning problems by updating a parameterized policy known as an actor in a direction that increases an estimate of the expected return known as a critic. However, existing actor-critic methods only use values or gradients of the critic to update the policy parameter. In this paper, we propose a novel actor-critic method called the guide actor-critic (GA...
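The first-order pattern the abstract alludes to, updating the actor along the critic's gradient with respect to the action as in deterministic policy gradients, can be sketched as follows; the quadratic critic and linear actor are assumptions for illustration, and GAC itself goes beyond this update.

    import numpy as np

    def critic(state, action):
        # Hypothetical quadratic critic: the best action equals the state.
        return -np.sum((action - state) ** 2)

    def critic_grad_action(state, action):
        # dQ/da for the quadratic critic above.
        return -2.0 * (action - state)

    w = np.zeros((2, 2))  # linear deterministic actor: a = w @ s
    lr = 0.05
    rng = np.random.default_rng(1)

    for _ in range(500):
        s = rng.normal(size=2)
        a = w @ s
        # Chain rule: dQ/dw = outer(dQ/da, s); ascend the critic.
        w += lr * np.outer(critic_grad_action(s, a), s)

    print(np.round(w, 2))  # approaches the identity, i.e. a ~= s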


On Actor-Critic Algorithms

In this article, we propose and analyze a class of actor-critic algorithms. These are two-time-scale algorithms in which the critic uses temporal difference learning with a linearly parameterized approximation architecture, and the actor is updated in an approximate gradient direction, based on information provided by the critic. We show that the features for the critic should ideally span a su...
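A minimal instance of the described scheme on an assumed two-state chain: the critic runs TD(0) with a linearly parameterized value function on the faster timescale, while the actor follows an approximate gradient built from the critic's TD error. The one-hot features are an illustrative choice, not the paper's recommendation.

    import numpy as np

    rng = np.random.default_rng(0)
    gamma = 0.9
    phi = np.eye(2)          # assumed linear features (one-hot per state)
    w = np.zeros(2)          # critic weights: V(s) ~= w @ phi[s]
    theta = np.zeros(2)      # actor: P(a=1 | s) = sigmoid(theta[s])

    def env_step(s, a):
        # Toy chain: taking action 1 in state 0 earns a reward and advances.
        if s == 0:
            return (1, 1.0) if a == 1 else (0, 0.0)
        return (0, 0.0)  # state 1 always resets to state 0 with no reward

    s = 0
    for t in range(1, 20001):
        p1 = 1.0 / (1.0 + np.exp(-theta[s]))
        a = int(rng.random() < p1)
        s2, r = env_step(s, a)
        delta = r + gamma * w @ phi[s2] - w @ phi[s]  # TD(0) error
        w += (1.0 / t ** 0.6) * delta * phi[s]        # critic, fast timescale
        theta[s] += (1.0 / t) * delta * ((1 - p1) if a == 1 else -p1)  # actor
        s = s2

    print(np.round(w, 2), np.round(1 / (1 + np.exp(-theta)), 2))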


Efficient Actor-Critic Algorithm with Hierarchical Model Learning and Planning

To improve the convergence rate and the sample efficiency, two efficient learning methods AC-HMLP and RAC-HMLP (AC-HMLP with ℓ2-regularization) are proposed by combining actor-critic algorithm with hierarchical model learning and planning. The hierarchical models consisting of the local and the global models, which are learned at the same time during learning of the value function and the polic...
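To make the model-learning-plus-planning idea concrete, here is a Dyna-style tabular sketch, a much simpler relative of what the abstract describes: each real transition updates the value estimates and a learned model, and the model then generates extra simulated updates, which is where the sample-efficiency gain comes from. The tabular Q-learning setup and toy chain are assumptions; AC-HMLP's local/global models and its ℓ2-regularized variant are more elaborate.

    import random
    import numpy as np

    random.seed(0)
    n_states, n_actions, gamma, alpha, eps = 3, 2, 0.9, 0.1, 0.3
    Q = np.zeros((n_states, n_actions))
    model = {}  # (s, a) -> (r, s2): a learned deterministic model

    def env_step(s, a):
        # Toy chain: action 1 advances toward the rewarding final state.
        s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        return (1.0 if s2 == n_states - 1 else 0.0), s2

    s = 0
    for _ in range(3000):
        a = random.randrange(n_actions) if random.random() < eps \
            else int(np.argmax(Q[s]))
        r, s2 = env_step(s, a)
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])  # real update
        model[(s, a)] = (r, s2)                                 # model learning
        for _ in range(5):  # planning: extra updates from simulated experience
            (ps, pa), (pr, ps2) = random.choice(list(model.items()))
            Q[ps, pa] += alpha * (pr + gamma * Q[ps2].max() - Q[ps, pa])
        s = 0 if s2 == n_states - 1 else s2

    print(np.round(Q, 2))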



Journal:
  • CoRR

Volume: abs/1712.00948
Issue: -
Pages: -
Publication date: 2017